In biological statistics it is often necessary to study the differences
between two variable types. The problem may be exemplified by a 
consideration of the differences between the types represented by popu- 
lations of various countries, as, for instance, between the populations of 
Sweden, Switzerland, and Central Africa. It is obvious that the type 
of Sweden differs less from that of Switzerland than the latter differs 
from the type of Central Africa. Nevertheless, it is difficult to say 
just what is meant by greater or lesser difference in type. The at- 
tempt to establish and describe varieties of races according to character- 
istic features that are considered as significant from a morphological 
point of view, suffers, therefore, from a lack of clarity of concept. 

The differences between the averages of types have been utilized for 
the purpose of segregating subtypes of human races, as, for instance, 
in the classification of the local types into which the European race may 
be divided. Pigmentation, stature, form of hair, head, face, and nose 
have been so utilized. For example, local types have been described 
by Deniker 1 by assigning to each group peoples among whom certain
average values of measurements are found. All those that have 
average statures, head indices, facial forms, nose forms, and pigmenta- 
tion falling within certain limits that may be expressed numerically, 
were assigned by him to a certain subrace. Although it is possible to 
give in this manner a definite description of local types, the biological sig- 
nificance of the observed differences remains undetermined. Obviously 
the classification obtained by the method here indicated will vary ac- 
cording to the limits which we set for each division. If we call tall 

1 The Races of Man. 
those populations whose average stature is more than 170 cm., their
assignment to a subdivision will not be the same as the one obtained 
when we call those tall whose average stature is more than 172 cm. If 
no valid reason can be given for the choice of one or the other limit, then 
the subtype so established can have a conventional descriptive mean- 
ing only. If we wish to establish a biologically significant classification 
we should have to prove that the descriptive features selected are 
morphologically significant. Furthermore, it would be necessary to 
distinguish between environmental and hereditary influences that 
determine the particular features which are made the basis of the 
classification. As a matter of fact, this study has never been made; 
and since the lines of demarcation between classes are arbitrary, these 
classes will be only a convenient schematic review of the distribution 
of certain selected combinations of descriptive features. 1 

In the following pages I wish to discuss the question whether a valid 
method of comparing closely allied forms can be found, so that arbi- 
trary classifications may be avoided and measurable differences between 
types established. 

It may seem that maps showing the distribution of a single feature or 
of combined features would give this information. Retzius' maps of 
Sweden, Livi's maps of Italy, Virchow's map of hair color in Germany, 
anthropological maps of France, England, and Spain, all illustrate the 
distribution of forms of the body, either by showing the areas in which 
the same average value of a measurement occurs, or areas in which 
certain selected values occur with equal frequency. The maps are 
intended to convey the impression that sameness of average values or of 
frequencies indicates the occurrence of the same racial forms. They also
indicate that the differences between types are equivalent whenever 
the differences between averages or between frequencies of occurrence 
of selected values are the same. This has often led to the interpreta- 
tion that the values whose frequency is shown represent separate racial 
types. Thus, the frequency of long-headedness in an area is often said 
to mean that a long-headed race forms a certain proportion of the 
population, although no biological basis can be given for the claim that 
the arbitrarily selected values represent a separate racial type. 

The essential difficulty of our problem may be made clearer by the 
following considerations. Each racial type is variable. When we study 
the distribution of any particular feature, let us say of the cephalic 
index among European types, we find that the forms which occur in 
each area are variable, and the individuals composing the populations of 

1 See also St. Poniatowski, Ueber den Wert der Index-Klassifikation, Archiv für Anthropologie,
N. F. Vol. 10, p. 1.
different areas show in part the same numerical values of the measurement.
The distribution of forms in each population is such that the
types overlap. The average cephalic index in Sweden may be 77; in 
Bavaria 85. Nevertheless, there will be many individuals that have 
the index 81, both in Bavaria and in Sweden; and according to this 
particular feature individuals may belong to either group. We know 
that if two regions are not too far apart, it is quite impossible to assign 
with certainty a single individual to either of them. 

If we assume for the moment that the variability found both in 
Sweden and in Bavaria were very low, so that the highest cephalic index 
occurring among Swedes would be not more than 80 and the lowest oc- 
curring in Bavaria not less than 82, then the two series would appear to 
us entirely distinct. It would be quite inadmissible to claim that the 
differences between the two pairs of groups were the same in the cases 
of greater variability (which has actually been observed) and the lesser 
variability (which has here been assumed), although in both cases the 
averages show the same differences. In the latter case we judge that 
the difference is greater, or perhaps better, more fundamental. 
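The contrast between the two cases may be illustrated numerically. The following is only a sketch: it takes the averages 77 and 85 from the text, but the normal form of the distributions and both standard deviations (3.5 and 1.0) are assumptions made for illustration, and the function names are hypothetical.

```python
import math


def normal_cdf(x, mean, sd):
    """Cumulative probability of a normal distribution at x."""
    return 0.5 * (1.0 + math.erf((x - mean) / (sd * math.sqrt(2.0))))


def share_beyond_midpoint(mean_low, mean_high, sd):
    """Share of the lower group lying above the midpoint of the two averages.

    Both groups are assumed normal with the same standard deviation, so the
    same share of the upper group lies below the midpoint.
    """
    midpoint = (mean_low + mean_high) / 2.0
    return 1.0 - normal_cdf(midpoint, mean_low, sd)


# Averages from the text; the standard deviations are assumed values.
wide = share_beyond_midpoint(77.0, 85.0, sd=3.5)
narrow = share_beyond_midpoint(77.0, 85.0, sd=1.0)
print(f"share beyond the midpoint, sd = 3.5: {wide:.3f}")
print(f"share beyond the midpoint, sd = 1.0: {narrow:.6f}")
```

The difference of the averages is the same in both runs; only the assumed variability, and with it the amount of overlapping, changes.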

Obviously our judgment is influenced by the degree of variability, 
still more, by the degree of overlapping of the two series. Only if we 
assume quite arbitrarily that the individuals that show the average 
values of the measurement in question — or some other selected value — 
were the true representatives of the whole population, and that all 
others were present only as foreign, intrusive elements, or if their oc- 
currence represented modifications of the typical form due to extraneous 
causes — only under these conditions could we say that the difference 
between the selected values represents the difference between the types. 
A concept of variability like the one involved in these assumptions is, 
however, quite inadmissible. The group must be considered as a class 
and its variability be determined by the definition of the class in ques- 
tion. Our detailed study of the class will always be directed toward 
the discovery of new principles of classification by means of which 
subclasses are formed whose variability will be less than that of the 
original class. In this way we try to define the newly formed sub- 
classes more sharply than the original class, and the advance in our 
knowledge consists in the discovery of the factors that make the sub- 
classes more determinate. It would be quite arbitrary to select one 
particular individual as the type, and to claim either that all others are 
not really members of the class or that they are modified forms of the
type. This method of procedure would contravene the fundamental 
concept of variability, for a variable comprises all the representatives 
of a class, the individual components of which are only defined in so far 
as they are members of the class; this in contradistinction to constants
which are assumed to be completely defined and must therefore be the 
same in every case. 

As soon as these principles are held clearly in mind, it appears that 
the ordinary definition of arithmetical difference is not applicable in our 
case. The term "difference" as applied to variables does not mean the
same as the term "difference" applied to constants. Variables cannot
be brought into a measurable series by the same means that we use for 
constants which may be compared by means of an arbitrary standard 
that is also constant. 

The problem before us is how to overcome these difficulties — how to 
give a definite meaning to the differences between variables and make 
these differences measurable. 

The question has been treated by G. H. Mollison 1 and by J. Czekanowski. 2
Mollison has discussed particularly the problem of differences
between two types, and he gives an arbitrary formula which
later on was modified by St. Poniatowski, 3 according to whom the
difference

$$ D = (a_1 - a_2)\,\frac{\sigma_1 + \sigma_2}{\sigma_1 \sigma_2}. $$

In this formula the distance between the two averages is expressed as
a multiple of the two values of $\sigma$, and, as a measure of the distance, is
assumed the value $\frac{\sigma_1 \sigma_2}{\sigma_1 + \sigma_2}$. The problem, however, can hardly be
solved by such arbitrary methods.

In the following pages I shall discuss some possible approaches to the 
problem. 

What we call difference in this case is not by any means an arith- 
metical difference; it is a judgment of the degree of dissimilarity of 
two series. If two series are so far apart that notwithstanding their 
variability they do not overlap, they are entirely dissimilar. If they 
do overlap they will be the more dissimilar, the less the amount of over- 
lapping. In this sense we may say that two pairs of series in which 
the amount and character of overlapping are equal will be equally 
dissimilar. While we may thus determine equality of dissimilarity we 
are not in a position to determine quantitatively the degrees of dis- 
similarity. 

1 Morphologisches Jahrbuch, Vol. 42, p. 79.

2 Korrespondenz-Blatt der Deutschen anthropologischen Gesellschaft, Vol. 40.

3 Archiv für Anthropologie, N. F. Vol. X, p. 274.

In treating this problem we may first of all explain the meaning of
similarity and dissimilarity by means of a few examples. Let us assume
that a pure negro and a pure white population are to be compared.
The types are so distinct in all their features that in comparing them we 
should emphasize simply their dissimilarities. Now let us assume that 
a third community is added, consisting perhaps of baboons. It ap- 
pears at once that our point of view would be shifted from a considera- 
tion of dissimilarities between negroes and whites to the similarities 
which they have in common as compared with the baboon, and their 
similarities will appear to us now under a new angle and as of different 
value. 

When we compare a group of blond, blue-eyed North Europeans 
with dark complexioned, brown-eyed South Europeans, their dissimilar- 
ities are the most striking feature. If we add a negro community to 
these two groups, the similarities between the North and South Euro- 
peans would be much more prominently in our minds. We may ob- 
serve the same changing attitude when we speak of family resemblances, 
or similarities. When we consider the children of a family, entirely by 
themselves, without any reference to any other family, they will appear 
to us as dissimilar. If the family has a particular characteristic fea- 
ture, let us say, for instance, a long narrow nose, which all the children 
have to a greater or less extent, this will become the feature which 
makes them similar as compared to the rest of the population. 

It is, therefore, clear that the concept of the degree of similarity 
depends upon the characteristics of all the groups that are under con- 
sideration and will change with the groups that are being compared. 

In investigations on heredity it has been customary to determine the 
degree of similarity by means of the coefficient of correlation. When, 
for instance, parents and offspring are compared, the coefficient of 
correlation between the two will indicate the degree of their similarity. 
There is a biological relation between parent and offspring. The 
average form of the offspring is determined by the degree to which the 
parent differs from the average of the population to which he belongs. 
In marriage we may have selective mating through which the forms of 
two parents may be correlated. When the husband differs from the 
average of the population by a certain amount his wife will differ by a 
correlated amount. In both of these cases there is a functional relation 
between the two values. The distinguishing feature of fraternal corre- 
lation is that we are dealing with a natural group in which there is 
no true functional relation whatever. In a very large fraternity the 
bodily form of one member does not influence in any way either the 
average body form of the rest of the fraternity or the distribution of the 
individual forms. The fundamental equation 1 of correlation

[y] = qx. 

1 Brackets indicate averaging. 
has no meaning in this case because the coefficient of regression q will
approximate zero — the more so the larger the fraternity. This is due to 
the fact that the members of the fraternity are all members of the same 
variable class, while in all the other cases previously noted we are deal- 
ing with relations between different classes. Fraternal correlation 
originates only in a population in which the fraternities represent dif- 
ferent types. If all the families had the same average value there 
would be no correlation and no similarity between brothers. The 
greater the heterogeneity of the population, the greater will be the 
correlation and similarity between members of a fraternity. If we call 
the coefficient of fraternal correlation r, the deviation of individuals 
from the average of the whole series x, the deviation of the family 
average from the general average of the population d, then for families 
of n children, 

$$ \left[\left(\frac{x_1 + x_2 + x_3 + \dots + x_n}{n}\right)^{2}\right] = [d^{2}]. \qquad (1) $$

If the standard deviation of all the values of d is s and the standard
deviation of all the values of x is $\sigma$, then it follows from (1) that

$$ s^{2} = \frac{\sigma^{2}}{n}\,\bigl(1 + (n-1)\,r\bigr), \qquad (2) $$

and for a large number of n,

$$ r = \frac{s^{2}}{\sigma^{2}}. $$

When all the families are alike, s is zero, and therefore r is also zero.
The greater is s, that is the greater the diversity of family types, the 
larger is r. 

If we assume the variabilities of the fraternities to be the same for all
families and to be expressed by $s_1$, then for large numbers of n,

$$ \sigma^{2} = s^{2} + s_1^{2} $$

and

$$ r = \frac{1}{1 + \dfrac{s_1^{2}}{s^{2}}}. $$
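The relation between fraternal correlation and the diversity of family types can be checked by a small simulation. This is a sketch under stated assumptions, not part of the original argument: family averages and individual deviations are both drawn from normal distributions, pairs of siblings stand in for the fraternity, and all function names are hypothetical. The predicted value uses r = s²/σ² with σ² = s² + s₁².

```python
import math
import random
import statistics


def pearson(xs, ys):
    """Plain product-moment correlation of two equally long lists."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def sibling_correlation(n_families, s, s1, seed=0):
    """Simulate one pair of siblings per family and return their correlation.

    Each family average deviates from the general average by d ~ N(0, s);
    each sibling deviates from the family average by N(0, s1).
    """
    rng = random.Random(seed)
    first, second = [], []
    for _ in range(n_families):
        d = rng.gauss(0.0, s)
        first.append(d + rng.gauss(0.0, s1))
        second.append(d + rng.gauss(0.0, s1))
    return pearson(first, second)


def predicted_correlation(s, s1):
    """r = s^2 / sigma^2, taking sigma^2 = s^2 + s1^2."""
    return s ** 2 / (s ** 2 + s1 ** 2)


observed = sibling_correlation(20000, s=2.0, s1=3.0)
print(f"observed r  = {observed:.3f}")
print(f"predicted r = {predicted_correlation(2.0, 3.0):.3f}")
```

Setting s to zero in the same sketch makes all families alike, and the simulated correlation then scatters around zero, as the text states.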

Exactly the same considerations may be made for racial types. A 
local variety may be considered as a fraternal group. The term r will 
then be a measure of the heterogeneity of the people or of the dis- 
similarity of the groups. 1 

1 In another place ("On the Varieties of Lines of Descent Represented in a Population," American
Anthropologist, N. S. Vol. 18, pp. 1-9), I have shown how the heterogeneity of a population may be
determined according to this method even in those cases in which each fraternity consists of a small
number of individuals, and I have given a number of examples of heterogeneous and homogeneous
populations. It may be remarked here that in a mixed population for which no information as to
descent is available, heterogeneity will ordinarily appear when the measurements of two organs are
correlated. If various families represent different types with regard to two measurements which are not
correlated; if the averages of these types have deviations d and d' from the general average, and if the
individuals deviate from the respective averages of their families by the amounts x and y, then their
correlation may be expressed by

$$ r = \frac{[(d + x)(d' + y)]}{\sigma_x \, \sigma_y} $$

and since x and y are assumed not to be correlated

$$ r = \frac{[d d']}{\sigma_x \, \sigma_y}. $$

Calculating in this way the correlation between cephalic index and stature for the whole of Italy on the
basis of Livi's data, we find a correlation of +.1, while in a homogeneous series the correlation between
stature and cephalic index is negative. The value +.1 is derived from observations of 300,000
individuals, and is therefore quite certain.

I have discussed this phenomenon in relation to the population of Paris in the American
Anthropologist, N. S. Vol. 1, pp. 448-61.

Thus we are led again to the conclusion that similarity has a meaning
only when we consider a group as a whole, and that it is not possible to
employ the terms "similarity" or "dissimilarity" in such a way that
they have the same meaning for all combinations of groups. I have
calculated, according to the formula just given, on the basis of Livi's
data, the heterogeneity of the Italian population with regard to their
cephalic index and find it to be 0.34, while it is only 0.12 for the Italian
population north of the Apennines.

It does not seem that we can obtain in this manner a satisfactory
measure for differences in the degree of similarity between single
groups.

The problem of the definition of similarities has been treated fully in
experimental psychology. Weber's law is actually an expression of the
observation that the differences between two pairs of sensations are
judged to be equal. In this case the basis of empirical determination
of similarity is the probability of mistaking one difference for another.
It is not, as was originally assumed, a measure of quantitative value of
the sensation itself. This concept of similarity holds good not only in
the case of simple sensations but also in the field of more complex
experience. We may speak of similarity, or of the probability of failing
to differentiate, for the most diverse kinds and the most complex forms
of mental experience. The problem that we are discussing here has
suggested itself in every comparative study of mental processes.

In an analogous manner we may define the degree of similarity as the
probability of mistaking an individual that belongs to one group for a
member of any of the other groups concerned. The degree of
dissimilarity may then be determined by the probability of recognizing an
individual as belonging to his own group.

If we call $p_1$ the probability of occurrence of a certain measurable
form X, which is a member of a certain variable class; $p_2$ the probability
of occurrence of the same form X as a member of a second class, etc.; if,
furthermore, n such classes occur in which the observation X has the
probabilities of occurrence $p_1, p_2, p_3 \dots p_n$ and if the probabilities
of occurrence of the classes are $p_1', p_2', p_3' \dots p_n'$, then the
probability of occurrence of the form X as a member of the series r is
$p_r p_r'$; and the total probability that any individual of measure X will
occur in the aggregate of n series will be

$$ p_1 p_1' + p_2 p_2' + p_3 p_3' + \dots + p_n p_n'. $$

The probability that a certain value X among all the values of X
belongs to the series r is therefore

$$ \frac{p_r p_r'}{\sum p p'}. $$

Now let us assume that there be given n individuals of the form X
whose provenience is not known, but who actually belong to the series r.
If our knowledge of the aggregate is perfect, then the n given individuals
must be assigned according to the principle that among n assignments
$n \frac{p_r p_r'}{\sum p p'}$ must belong to the series r, the rest to the series not r.
The probability of a correct assignment of a single observation is
therefore

$$ \frac{p_r p_r'}{\sum p p'}. $$

If, now, a series of observations of the class r is given, the probability
of the occurrence of X in this series will be $p_r$.

Then, under the conditions just mentioned, the probability of the
occurrence and at the same time of the correct assignment of X to the
class r will be

$$ P_r = \frac{p_r^{2} \, p_r'}{\sum p p'}. \qquad (3) $$

The probability that any one of all the values of X that do occur is
correctly assigned, is therefore

$$ P = \int_{-\infty}^{+\infty} \frac{p_r^{2} \, p_r'}{\sum p p'} \, dx \qquad (4) $$

and we may call this the degree of dissimilarity between the series r
and the aggregate of all the series that are being considered. It is
obvious that in this manner the degrees of diversity for all the other
individual series may be established, that they may be subtracted from
one another, and that in this manner differences in the degree of
similarity may be determined.

It may be pointed out that when three series are compared in this 
manner in pairs, the resultant values are not additive. If only series 
(1) and series (2), then series (1) and (3), then series (2) and (3) are 
considered, the sum of the difference between (1) and (2) plus that be- 
tween (2) and (3) will not be equal to the difference between (1) and 
(3). This is another expression of the observation made before that 
the meaning of similarity changes with the aggregate of the series that 
is being considered. 

It might also seem possible to arrange the single series in the order of 
their averages and to determine their dissimilarities step by step. Here 
the difficulty may arise that two succeeding averages may be nearly the 
same, while their variabilities may be quite different. Whenever this 
occurs quite an erroneous impression of the differences would be given. 
The reason for this difficulty lies in the fact that the difference as here 
defined depends upon the averages and variabilities of the single series, 
and that certain combinations of these result in the same degree of 
dissimilarity. 

In the case treated here the various series enter into the aggregate 
according to the number of individuals representing each series. It 
might be, for instance, that a large mass of material has been ac- 
cumulated for one group and that another group is known through the 
study of a very few individuals only. Our expression contains, there- 
fore, a weighting according to number which obscures the more general 
theoretical question. If the groups were known perfectly, then all 
would have equal weight, i. e., we should have to assume them to be 
represented by equal numbers. All the values p' would then be equal. 
This is the assumed condition under which most comparisons of types 
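are actually made. We have, then,

$$ P = \int_{-\infty}^{+\infty} \frac{p_r^{2}}{\sum p} \, dx \qquad (5) $$

as a measure of dissimilarity and

$$ 1 - P = 1 - \int_{-\infty}^{+\infty} \frac{p_r^{2}}{\sum p} \, dx \qquad (5^{*}) $$

as a measure of similarity.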
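The measure with equal weights can be evaluated numerically once a form is assumed for each distribution. The sketch below is an illustration only: it reads equation (5) as the integral of p_r² / Σp, assumes normal distributions, and uses a simple midpoint rule. The function names, the integration limits, and the standard deviation of 3.5 are assumptions; only the averages 77 and 85 come from the text.

```python
import math


def normal_pdf(x, mean, sd):
    """Density of a normal distribution at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def dissimilarity(r, series, lo, hi, steps=20000):
    """Approximate the integral of p_r(x)^2 / sum_k p_k(x) over [lo, hi].

    `series` is a list of (mean, sd) pairs, one per series, all given equal
    weight; `r` is the index of the series being assessed.
    """
    width = (hi - lo) / steps
    total = 0.0
    for i in range(steps):
        x = lo + (i + 0.5) * width
        densities = [normal_pdf(x, mean, sd) for mean, sd in series]
        denominator = sum(densities)
        if denominator > 0.0:  # skip points where every density underflows
            total += densities[r] ** 2 / denominator * width
    return total


identical = dissimilarity(0, [(80.0, 3.0)] * 3, lo=50.0, hi=110.0)
separate = dissimilarity(0, [(0.0, 1.0), (100.0, 1.0)], lo=-10.0, hi=110.0)
overlapping = dissimilarity(0, [(77.0, 3.5), (85.0, 3.5)], lo=40.0, hi=120.0)
print(identical, separate, overlapping)
```

Three identical series give a value close to one third, two series that never overlap give a value close to one, and two overlapping series fall between one half and one.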

Whether equation (5) or equation (4) represents the conditions of a
set of observations, or whether a form with still different weighting 
should be selected, depends upon the clarity of our concept of the 
characteristics of each group. If we assume each group as thoroughly 
studied and therefore known in all its characteristics, then equation (5)
will represent the conditions adequately. On the other hand, if we are 
impressed by the unclassified series as a whole, without detailed study 
of each group, and if we try to determine the similarities and dis- 
similarities on this basis, then formula (4) will correspond to the condi- 
tions of the investigation. If subjective elements are to be eliminated 
as far as possible, we must try to adjust conditions so that equation (5) 
can be applied. As a matter of fact, our judgment of similarity in all 
cases of this type is fluctuating; sometimes one group, sometimes an- 
other, is most prominently in our minds, and the actual assignments 
are therefore different from the two extreme forms (4) and (5) discussed 
before and may lie somewhere in between, or they may change with 
changing mental conditions. The more thorough our knowledge of 
each series, the closer will be the approach to form (5). 

The formulas here given present the inconvenience that the values 
obtained for similarity are the smaller, the larger the number of series 
forming the aggregate, so that when the number of similar series is very 
great the values of their similarities will be exceedingly small. 

In the final results it may appear that some of these series have the 
same degree of dissimilarity. If the averages and variabilities of these 
series are also indicative of identity, the series should be combined. 

It must be remembered that it is possible for a number of different 
distributions to result in the same amount of dissimilarity. Since every 
distribution depends at least upon two constants, average and standard
deviation, there are whole sets of functions which will give us the same 
value for the total probability of mistaking a member of one series for a 
member of the rest of the aggregate. However, owing to the general 
likeness of forms of distribution, the occurrence of this event is im- 
probable. On the other hand, dissimilarity can occur only when dis- 
tributions are unlike. The minimum amount of dissimilarity is found 
when all the series are identical. If there are n series, the value of
dissimilarity will be $\frac{1}{n}$.
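This minimum may be verified directly, assuming equal weights as in equation (5) and writing p(x) for the one distribution shared by all n series, so that the sum of the probabilities is n p(x):

$$ P = \int_{-\infty}^{+\infty} \frac{p(x)^{2}}{n \, p(x)} \, dx = \frac{1}{n} \int_{-\infty}^{+\infty} p(x) \, dx = \frac{1}{n}. $$

For sixteen series, for instance, this gives 1/16 = 0.0625.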

All the dissimilarities have necessarily positive values. For this 
reason if one dissimilarity consists in the prevalence of high values, an- 
other one in the prevalence of low values, they ought to be distinguished, 
and the direction of dissimilarity may be indicated by opposite signs. 
It will, therefore, be necessary in every case to consult also the charac- 
teristic averages and distributions in order to give the proper sign to the 
values of dissimilarity. It must, however, be understood that the 
values are additive only without regard to the sign, since they all 
represent probabilities. 
I have calculated on the basis of (5) the dissimilarities of the cephalic
index, stature, and hair color, in Italy, basing my calculations on Livi's
material. 1 The following tables give the results of this calculation:

TABLE I. CEPHALIC INDEX

                             Average     Average cephalic index of
            Dissimilarity    cephalic    individuals judged to
                             index       belong to each series *

Piemont         0.103        85.9        87.8
Liguria     (-) 0.061        82.3        82.3
Lombardy        0.083        84.4        85.2
Venice          0.088        85.0        86.2
Emilia          0.091        85.2        86.8

Tuscany     (-) 0.063        82.3        82.4
Marche          0.073        84.0        85.4
Umbria          0.074        84.1        85.5
Lazio       (-) 0.067        81.0        80.0
Abruzzi     (-) 0.065        81.9        81.7

Campania    (-) 0.068        82.1        82.0
Puglie      (-) 0.080        79.8        78.4
Basilicata  (-) 0.072        80.8        79.9
Calabria    (-) 0.110        78.4        75.4
Sicily      (-) 0.085        79.6        78.4

Sardinia    (-) 0.152        77.5        74.9

* See p. 438.



TABLE II

            STATURE                                                 HAIR COLOR

                             Average     Average stature of
            Dissimilarity    stature     individuals judged to      Dissimilarity
                                         belong to each series

Piemont         0.0643       164.9       165.7                          0.070
Liguria         0.0672       165.5       166.8                          0.069
Lombardy        0.0667       165.3       166.6                          0.066
Venice          0.0751       166.6       168.6                          0.072
Emilia          0.0673       165.3       166.8                          0.066

Tuscany         0.0682       165.6       167.1                          0.070
Marche      (-) 0.0633       163.8       163.5                          0.065
Umbria      (-) 0.0636       164.2       164.4                          0.064
Lazio       (-) 0.0630       164.3       164.4                      (-) 0.068
Abruzzi     (-) 0.0654       163.2       162.3                          0.068

Campania    (-) 0.0643       163.5       163.0                      (-) 0.065
Puglie      (-) 0.0646       163.5       163.0                      (-) 0.064
Basilicata  (-) 0.0689       162.6       161.3                      (-) 0.068
Calabria    (-) 0.0652       163.1       162.3                      (-) 0.066
Sicily      (-) 0.0636       163.5       162.9                      (-) 0.065

Sardinia    (-) 0.0754       161.9       160.4                      (-) 0.077

I have calculated these tables from the summary of observations for
sixteen provinces. The minimum of dissimilarity is therefore 0.0625,
and it will be seen that a number of provinces differ so little from the
mass of the Italian material that they have no distinguishing
characteristics. On the other hand, Piemont and Sardinia show,
comparatively, very great differences.

1 Rodolfo Livi, Antropometria Militare, pp. 248-49; 266-67; 286-87.

If a number of independent traits are examined together the
probability of a correct assignment increases. When the probabilities of
correct assignment for various traits are $p_1, p_2 \dots p_r$, then the
probability of an incorrect assignment considering all the characteristics
together will be

$$ (1 - p_1)(1 - p_2) \dots (1 - p_r). $$

When all the series are identical, and only one trait is studied, the
probability of an incorrect judgment will be $\frac{n-1}{n}$, where n is the
number of series compared. For r independent traits, the probability of an
incorrect judgment in identical series will be $\left(\frac{n-1}{n}\right)^{r}$. This value
represents the maximum probability of an incorrect judgment. The
minimum probability of an incorrect judgment occurs in cases of
absolute difference of all individuals, and is zero.

The values of probabilities derived from a combination of several
characteristics may be made more comparable by determining the
ratio between $\left(\frac{n-1}{n}\right)^{r}$ and the observed value of the product
$(1 - p_1)(1 - p_2) \dots (1 - p_r)$.

Applying this to the sixteen provinces of Italy, tabulated on page 435,
we find the maximum probability of misjudgment for a single
characteristic to be $\frac{15}{16}$, or 0.937; for two characteristics combined,
$\left(\frac{15}{16}\right)^{2}$, or 0.879; and for three, $\left(\frac{15}{16}\right)^{3}$, or 0.824.
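The arithmetic of the combination may be sketched as follows. The three probabilities of correct assignment for Piemont are the dissimilarities given in Tables I and II; the function and variable names are hypothetical, and the ratio is taken as the observed product divided by the maximum, which is how the tabulated ratios appear to be formed.

```python
def max_misjudgment(n_series, n_traits):
    """Probability of an incorrect judgment when all series are identical."""
    return ((n_series - 1) / n_series) ** n_traits


def combined_misjudgment(correct_probabilities):
    """Product (1 - p_1)(1 - p_2)...(1 - p_r) for independent traits."""
    product = 1.0
    for p in correct_probabilities:
        product *= 1.0 - p
    return product


# Piemont: cephalic index, stature, hair color (Tables I and II).
piemont = [0.103, 0.0643, 0.070]

observed = combined_misjudgment(piemont)
ceiling = max_misjudgment(16, 3)
ratio = observed / ceiling

print(f"maximum for one trait     : {max_misjudgment(16, 1):.4f}")
print(f"maximum for three traits  : {ceiling:.4f}")
print(f"observed product, Piemont : {observed:.4f}")
print(f"ratio to the maximum      : {ratio:.4f}")
```

The results agree to within rounding with the values 0.781 and 0.948 tabulated for Piemont under the three characters combined.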

Table III gives the absolute probabilities of misjudgments (column 
I) and their ratio to the maximum probability of a misjudgment 
based on a single character, namely, 0.937 (column II in the first 
three double columns) . Column II in the fourth double column gives 
the ratio between the maximum probability of a misjudgment based on 
a comparison of two characters and the observed probability of a mis- 
judgment for head index and stature. Column II in the last double 
column gives the same for three characters, namely, head index, 
stature, and hair color. 

TABLE III

            Cephalic        Stature         Hair color      Cephalic index  Cephalic index,
            index                                           and stature     stature, and hair
            I      II       I      II       I      II       I      II       I      II

Piemont     0.897  0.957    0.936  0.999    0.930  0.993    0.840  0.956    0.781  0.948
Liguria     0.939  1.000    0.933  0.996    0.931  0.994    0.876  0.997    0.816  0.990
Lombardy    0.917  0.979    0.933  0.996    0.934  0.997    0.855  0.973    0.799  0.970
Venice      0.912  0.973    0.925  0.987    0.928  0.990    0.844  0.960    0.783  0.950
Emilia      0.909  0.970    0.933  0.996    0.934  0.997    0.848  0.965    0.792  0.961

Tuscany     0.937  1.000    0.932  0.995    0.930  0.993    0.873  0.993    0.812  0.985
Marche      0.927  0.989    0.937  1.000    0.935  0.998    0.869  0.989    0.813  0.987
Umbria      0.926  0.988    0.936  0.999    0.936  0.999    0.867  0.986    0.811  0.984
Lazio       0.933  0.996    0.937  1.000    0.932  0.995    0.874  0.994    0.815  0.989
Abruzzi     0.935  0.998    0.935  0.998    0.932  0.995    0.874  0.994    0.815  0.989

Campania    0.932  0.995    0.936  0.999    0.935  0.998    0.872  0.992    0.815  0.989
Puglie      0.920  0.982    0.935  0.998    0.936  0.999    0.860  0.978    0.805  0.977
Basilicata  0.928  0.990    0.931  0.994    0.932  0.995    0.864  0.983    0.805  0.977
Calabria    0.890  0.950    0.935  0.998    0.934  0.997    0.832  0.947    0.777  0.943
Sicily      0.915  0.977    0.936  0.999    0.935  0.998    0.856  0.974    0.800  0.971

Sardinia    0.848  0.905    0.925  0.987    0.923  0.985    0.784  0.892    0.724  0.879

In this series Piemont and Sardinia appear as the most distinct
areas. It seems then justifiable to compare these alone in order to
determine their dissimilarity. According to the same method, applied
to these two territorial divisions alone, we find the dissimilarities as
shown in Table IV.

Thus, the probability of a misjudgment for Sardinians and Piemontese
for cephalic index alone is 0.420 of the probability for identical
series; for cephalic index and stature it is 0.389; and for cephalic index,
stature, and hair color it is 0.342 of the probability for identical series.
Still more strictly speaking, we may express the last column in the
following way: In attempting a classification of individuals belonging to
two identical series for which three measurements are compared, $(1/2)^3$
of the total number, or 125 out of 1,000 individuals, will be misplaced,
but only 43 out of 1,000 are misplaced when comparing Sardinians
and Piemontese.
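
The counts quoted above follow from simple arithmetic, which the following minimal sketch reproduces. It assumes, as the passage implies, that the probability of misplacement for identical series is 1/2 for each measurement, so that it is (1/2)^k for k measurements compared jointly. The function name `misplaced_per_thousand` is illustrative and not part of the original text.

```python
def misplaced_per_thousand(relative_probability, n_measurements):
    """Expected number of misplaced individuals per 1,000.

    relative_probability: probability of a misjudgment expressed as a
        fraction of the probability for identical series (1.0 means the
        two series are identical).
    n_measurements: number of measurements compared jointly.
    """
    # Assumption: identical series give (1/2)^k for k measurements.
    identical = 0.5 ** n_measurements
    return 1000 * identical * relative_probability


# Identical series, three measurements.
print(round(misplaced_per_thousand(1.0, 3)))
# Sardinians and Piemontese, using the value 0.342 quoted in the text.
print(round(misplaced_per_thousand(0.342, 3)))
```

The two printed values correspond to the 125 and 43 individuals per 1,000 mentioned in the text.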

The method here outlined enables us to segregate the most distinc- 
tive groups from the mass of overlapping material that belongs to a 
large area, as we have segregated here the Piemontese and Sardinians. 

TABLE IV

                                             I       II
Cephalic index                             0.210   0.420
Stature                                    0.463   0.926
Hair color                                 0.438   0.876
Cephalic index and stature                 0.097   0.389
Cephalic index, stature, and hair color    0.043   0.342

It will be noted that in our example Venice and Calabria are also
strongly divergent in type. A study of the measurements of these 
provinces shows that Calabria may be combined with Sardinia, while 
Venice appears somewhat distinct from Piemont. Accordingly, 
Calabria and Sardinia should be united. By a gradual elimination of 
intermediate forms it is possible not only to segregate the most diver- 
gent types of an area but also to determine the degree of their similarity. 

In applying the fundamental thought underlying our considerations 
to the classification of mankind, we might ask ourselves which are the 
series for which the similarity or the probability of a misjudgment be- 
comes zero, and these might be considered as the present fundamental 
human types. A satisfactory solution of this problem must not be 
based on the consideration of a few standardized measurements, but the 
features to be studied must be selected after a careful investigation of 
what is most characteristic of each group. 

It is also feasible to find in this manner outstanding types of a 
definite area and to arrange them according to the degrees of their 
similarity. The interpretation of the similarity, whether due to mix- 
ture, environment, or other causes, is of course a purely biological 
problem for which the statistical inquiry furnishes the material but 
which cannot be solved by statistical methods. 

We have seen that the individuals of a definite bodily form are not all 
assigned by us to the group to which they belong. The impression 
which we receive of characteristic forms of a particular series depends 
upon the distribution and the forms of individuals which we assign to it. 
While their distribution does not conform to the exponential function, 
it does not differ fundamentally from the distribution of other variable 
phenomena, and for this reason our impression of the general charac- 
teristic form of the series is expressed by the average of individuals that 
we assign to it. This value is 

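(6)   [an integral taken from $-\infty$ to $+\infty$; the integrand is not
      recoverable from this copy]

or, if the values of $p'$ are equal,

(6*)  [an expression for $a$ as an integral taken from $-\infty$ to $+\infty$;
      the integrand is not recoverable from this copy]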

This calculation for Italy has been carried out on the basis of the 
previous tables. The psychological impression of characteristic head 
forms of each province of Italy is given in the last column of the table 
on page 435, while the second columns contain the actual averages. 
This table is suggestive because it shows that we receive an exaggerated
impression of the characteristics of a series, since individuals that are
similar to other series are assigned to them according to their appearance
and are merged in the general background represented by the aggregate.
Our impression, however, does not correspond to an actual
type. This proves that the attempt to analyze a series according to
similarities of individuals into a number of subtypes is methodologically
not admissible, and that all subdivisions must be based on the study of
the series as a whole, not upon selected types.

The chief difficulty in the practical application of the method out- 
lined in the preceding pages is due to the facts that the degree of 
similarity depends upon the aggregate treated, and that there is no 
relation between the numerical values obtained for different aggre- 
gates. Not even the equality of differences between several given 
series need persist if new members are added to the aggregate or are 
taken away from it. 

In cases of continuous changes of a type from one extreme form to 
another, an artificial classification of the aggregate is unavoidable. By 
means of repeated adjustment equal degrees of similarity might be 
found according to the method outlined here, but the actual 
carrying out of such a plan offers serious difficulties. In such cases 
each series might be considered as a specialized form of the general ag- 
gregate and compared with it. The aggregate itself may, however, be 
established in two different ways. We may disregard the number of 
existing individuals, considering each morphological type contained in 
the aggregate as a unit. The units would then be given equal weight 
(i. e., equal numbers of cases). Or we may take the whole series as it 
exists at the present time, counting the total number of individuals 
that it contains, regardless of local types that may represent the same 
morphological form. By either of these methods we ascertain how 
dissimilar each morphological type is from the aggregate, but these 
values cannot be used to determine the mutual dissimilarities of the 
single series contained in the aggregate. When the types are combined 
according to the present actual number of individuals representing 
them, the most numerous type will appear least distinct from the 
average, merely on account of the large number of its members. This 
difficulty can hardly be avoided by comparing each series with the ag- 
gregate of the remaining series, because by this method the standard of 
comparison is always changing. On the other hand, the formation of 
the aggregate by giving equal weight to each morphological type entails 
the difficulty that we tried to avoid, namely, an arbitrary classification 
of the groups as a number of morphological types. 

The problem may be approached in another manner. We may determine
the frequency distribution of the differences between individuals
belonging to one series and those belonging to all the other series
of the aggregate. In this inquiry we have to determine the average
difference between the representatives of one series and those of all the
other series, and the variability of this difference. When the series are
arranged in pairs, the differences between the averages are additive, but
the variabilities are not comparable. The difference between a series
(1) and a series (2) will have the variability
$\pm\sqrt{\sigma_1^2 + \sigma_2^2}$, so that if one
series is taken as the basis of comparison with all the others, the
variabilities have a significance only in relation to the one series which
has been arbitrarily selected as a standard.

The interrelations between the series can be determined only when
we consider any one series in relation to all the others. Assuming
again that all the series are known, we may call the values of the
particular series

$$a_1 + x_1,\quad a_2 + x_2,\quad \dots,\quad a_n + x_n.$$

The differences between one series and all the others taken successively
are

$$\begin{aligned}
&(a_1 - a_2) + (x_1 - x_2)\\
&(a_1 - a_3) + (x_1 - x_3)\\
&\qquad\vdots\\
&(a_1 - a_n) + (x_1 - x_n),
\end{aligned}$$

or since there are $n-1$ groups of this type, the average difference will be

$$\frac{n}{n-1}\,a_1 - \frac{\Sigma a}{n-1}.$$

The mean square variability of this difference will be

$$\frac{1}{n-1}\left\{(x_1 - x_2)^2 + (x_1 - x_3)^2 + \dots + (x_1 - x_n)^2\right\},$$

or

$$\frac{n-2}{n-1}\,\sigma_1^2 + \frac{\Sigma\sigma^2}{n-1}.$$

We have, therefore, the characteristic difference between series (1) and
the rest of the series

$$\frac{n}{n-1}\,a_1 - \frac{\Sigma a}{n-1} \pm \sqrt{\frac{n-2}{n-1}\,\sigma_1^2 + \frac{\Sigma\sigma^2}{n-1}}.$$




If we assume a distribution according to the probability function, the 
differences may be made comparable by measuring them by their 
standard deviations. The difference between series (1) and all the 
other series taken conjointly will therefore be 

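$$\frac{n a_1 - \Sigma a}{\sqrt{(n-1)\left\{(n-2)\,\sigma_1^2 + \Sigma\sigma^2\right\}}} \tag{7}$$

or

$$\frac{n}{n-1}\cdot\frac{a_1 - [a]}{\sqrt{\sigma_1^2 + [\sigma^2] + \dfrac{[\sigma^2] - \sigma_1^2}{n-1}}},$$

where $[a]$ and $[\sigma^2]$ denote the averages of $a$ and $\sigma^2$ over
all $n$ series.

If $n$ is sufficiently large, we may write approximately

$$\frac{a_1 - [a]}{\sqrt{\sigma_1^2 + [\sigma^2]}}, \tag{8}$$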

which is the difference between the average of series (1) and the average 
of the total aggregate including series (1) measured by its variability. 
From this value and by means of a table of the probability integral 
may be determined the probability that the difference between an 
individual belonging to series (1) and any other individuals of the ag- 
gregate has any selected value, particularly the probability that this 
difference may be greater or less than zero. 
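
Formulas (7) and (8) can be evaluated directly, as in the following sketch. The function names and the sample averages and variabilities are hypothetical and chosen only for illustration. `probability_positive` replaces the printed table of the probability integral by the error function and assumes a distribution according to the probability function.

```python
import math


def difference_from_rest(averages, sigmas, k=0):
    """Formula (7): series k compared with all the other series."""
    n = len(averages)
    numerator = n * averages[k] - sum(averages)
    sum_of_squares = sum(s ** 2 for s in sigmas)
    spread = (n - 1) * ((n - 2) * sigmas[k] ** 2 + sum_of_squares)
    return numerator / math.sqrt(spread)


def difference_from_aggregate(averages, sigmas, k=0):
    """Formula (8): approximation for a large number of series."""
    n = len(averages)
    mean_a = sum(averages) / n
    mean_square = sum(s ** 2 for s in sigmas) / n
    return (averages[k] - mean_a) / math.sqrt(sigmas[k] ** 2 + mean_square)


def probability_positive(z):
    """Probability integral: chance that the difference exceeds zero."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


# Hypothetical averages and variabilities of four series (not from the text).
averages = [79.0, 81.0, 83.0, 85.0]
sigmas = [3.5, 3.5, 4.0, 4.0]
exact = difference_from_rest(averages, sigmas, k=0)
approx = difference_from_aggregate(averages, sigmas, k=0)
print(exact, approx, probability_positive(exact))
```

For two series only, `difference_from_rest` reduces to the difference of the averages divided by the square root of the sum of the two squared variabilities, which is the pairwise case discussed earlier.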

The advantage of this definition of difference is the relative stability
of the values obtained when the series are arranged in different groups.
As an instance I give the calculation for Italy. Table V gives the averages
and variabilities of the cephalic index for each circondario. Each
column gives the number of occurrences of the values of
the average cephalic index by half units, as 76.0-76.4, 76.5-76.9, etc.

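TABLE V

Number of circondari for each average cephalic index (by half units) and
each variability of the cephalic index. The variability classes, which
form the rows of the table, are 3.00-3.24, 3.25-3.49, 3.50-3.74,
3.75-3.99, 4.00-4.24, and 4.25-4.49. For each whole unit of cephalic
index the entries are given as pairs (first and second half unit), in
order of increasing variability; a point marks an empty cell, and rows
empty in both half units are not shown, so that the row to which a
pair belongs is not indicated.

Cephalic index    Entries
76                . 1
77                2 4;  2 1;  . 1
78                1 2;  6 1;  1 2;  1 1
79                4 1;  2 6;  1 3;  2 1
80                . 1;  2 1;  6 4;  . 3;  2 1
81                . 1;  2 6;  6 4;  . 3;  . 2
82                . 4;  3 4;  1 2;  1 2
83                2 1;  4 1;  5 3;  2 1;  1 1
84                3 1;  3 3;  6 4;  1 3;  2 1;  2 1
85                5 2;  6 3;  6 3;  1 1;  . 1
86                1 1;  4 2;  3 4;  . 2
87                . 2
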

Considering each combination of an index and a variability occurring
in Table V as a single morphological group, we obtain the values
given in Table VI for formula (8) for 84 such groups. Considering
the population in its actual distribution, i.e., counting each
morphological group according to the actual number of its occurrences,
we get for 206 groups the figures given in Table VII. For whole units of
cephalic index and half units of variability the corresponding figures
for 32 morphological units are given in Table VIII; and counting the
actual frequencies of the single groups we obtain the figures given in
Table IX.
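
The two ways of forming the aggregate, counting each morphological group once or counting it according to its actual frequency, can be compared with a small sketch of formula (8). The groups listed below are hypothetical and are not taken from Tables V to IX; the function name and the tuple layout are likewise illustrative assumptions.

```python
import math


def standardized_difference(groups, k, by_frequency):
    """Formula (8) for group k of `groups`.

    groups: list of (average, variability, count) tuples.
    by_frequency: if True, each group is weighted by its count;
        if False, each morphological group counts once.
    """
    weights = [count if by_frequency else 1 for _, _, count in groups]
    total = sum(weights)
    mean_a = sum(w * a for w, (a, _, _) in zip(weights, groups)) / total
    mean_square = sum(w * s ** 2 for w, (_, s, _) in zip(weights, groups)) / total
    a_k, s_k, _ = groups[k]
    return (a_k - mean_a) / math.sqrt(s_k ** 2 + mean_square)


# Hypothetical groups: (average cephalic index, variability, occurrences).
groups = [(78.25, 3.6, 1), (81.25, 3.9, 6), (84.75, 3.6, 3)]
for k in range(len(groups)):
    print(groups[k][0],
          round(standardized_difference(groups, k, by_frequency=False), 3),
          round(standardized_difference(groups, k, by_frequency=True), 3))
```

Printing both columns side by side shows how far the values for one and the same group shift when the weighting of the aggregate is changed.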

The problems take a slightly different form when populations are 
compared with regard to features that occur in a certain percentage of 
individuals and are absent in the rest. If, for instance, one population 
consists of 15 per cent negroes and 85 per cent whites, another one of 30 
per cent negroes and 70 per cent whites, it might seem that the difference 
could be stated simply as a difference of 15 per cent, but obviously the 
dissimilarity of these two types of population would not be the same as 
in another pair in which we have 40 per cent negroes and 60 per cent 
whites in one and 55 per cent negroes and 45 per cent whites in the 
other. In the latter case the populations would seem more alike to us 
than in the former case. This difficulty is still more pronounced if 
there are present not merely two types but a larger number of types in 
varying proportions. In all these cases we may apply the same 
methods which we used for the determination of similarity of measur- 
able quantities. 

The first method of determining similarities may be applied to this
problem without change. If a certain feature has the probabilities
$p_1, p_2, p_3, \dots, p_n$ in a number of series, then the frequency of its
occurrence in any one of the $n$ series will be
$p_1 + p_2 + p_3 + \dots + p_n$.
The probability that a single individual of series $r$ whose provenience is
not known is assigned correctly will be, therefore, $\frac{p_r}{\Sigma p}$,
and if the probability of occurrence of an individual of this series is
$p_r p'_r$, then the total probability of a correct assignment will be
$p_r p'_r \cdot \frac{p_r}{\Sigma p}$, or if we disregard the
number of cases in each series, $\frac{p_r^2}{\Sigma p}$. The sum of these
values for all the distinct traits occurring in the population, that is,
$\Sigma \frac{p_r^2}{\Sigma p}$, will be the
total probability of a correct assignment. In a case in which only two
forms constitute a population, and comparing two populations only, we
have accordingly for equal numbers of $p'$

$$\frac{p_1^2}{p_1 + p_2} + \frac{(1-p_1)^2}{(1-p_1) + (1-p_2)}.$$
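
As a rough numerical illustration of this expression, the sketch below evaluates it for the two pairs of proportions discussed above (15 and 30 per cent, 40 and 55 per cent) and for a pair of identical populations. The function name is hypothetical, and the sketch assumes equal numbers in both populations and proportions strictly between 0 and 1.

```python
def correct_assignment(p1, p2):
    """Probability of a correct assignment for an individual of the first
    of two populations, when each individual shows one of two forms and
    the populations are of equal size (an assumption of this sketch)."""
    present = p1 ** 2 / (p1 + p2)                    # feature present
    absent = (1 - p1) ** 2 / ((1 - p1) + (1 - p2))   # feature absent
    return present + absent


for p1, p2 in [(0.15, 0.30), (0.40, 0.55), (0.30, 0.30)]:
    print(p1, p2, round(correct_assignment(p1, p2), 4))
```

For identical proportions the function returns 1/2, and the pair 40 and 55 per cent lies closer to 1/2 than the pair 15 and 30 per cent.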



According to the second method we have to determine the average
difference between individuals of the several series and its variability.
If in a series the probability of the occurrence of a phenomenon is $p$,
then those individuals in which it does occur, and who therefore have
the probability of occurrence 1, have a deviation from the probability of
the series $1-p$; those in whom it does not occur and who therefore have
the probability of occurrence 0, have a deviation from the probability
of the series $-p$. Of the former there are $p$ cases, of the latter $1-p$
cases. We have, therefore, the following combinations:

The phenomenon is                         Frequency              Difference
present in series (1), present in (2)     $p_1 p_2$                  0
present in series (1), absent in (2)      $p_1(1-p_2)$              +1
absent in series (1), present in (2)      $(1-p_1)p_2$              -1
absent in series (1), absent in (2)       $(1-p_1)(1-p_2)$           0

The last column represents the average differences between the
probabilities of occurrence for each combination.
The average is therefore

$$p_1(1-p_2) - p_2(1-p_1) = p_1 - p_2.$$
We have, therefore, the following deviations from the averages and
their squares:

Frequencies            Difference of each series         Squares of differences
                       from average $(p_1 - p_2)$
$p_1 p_2$                $-(p_1 - p_2)$                      $(p_1 - p_2)^2$
$(1-p_1)(1-p_2)$         $-(p_1 - p_2)$                      $(p_1 - p_2)^2$
$p_1(1-p_2)$             $1 - (p_1 - p_2)$                   $(1 - p_1 + p_2)^2$
$p_2(1-p_1)$             $-1 - (p_1 - p_2)$                  $(1 + p_1 - p_2)^2$

and by addition

$$\sigma^2 = p_1(1-p_1) + p_2(1-p_2).$$
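
The average and the mean square variability just obtained can be checked by enumerating the four combinations, as in the following sketch. The function name is illustrative, the two series are assumed to be independent, and dividing the average by the standard deviation follows the earlier remark that differences may be made comparable by measuring them by their standard deviations.

```python
import math


def difference_moments(p1, p2):
    """Average and mean square variability of the difference between an
    individual of series (1) and an individual of series (2)."""
    combinations = [
        (p1 * p2, 0),               # present in both series
        (p1 * (1 - p2), 1),         # present in (1), absent in (2)
        ((1 - p1) * p2, -1),        # absent in (1), present in (2)
        ((1 - p1) * (1 - p2), 0),   # absent in both series
    ]
    average = sum(f * d for f, d in combinations)
    variance = sum(f * (d - average) ** 2 for f, d in combinations)
    return average, variance


# The two pairs of proportions used as an example in the text.
for p1, p2 in [(0.30, 0.15), (0.55, 0.40)]:
    average, variance = difference_moments(p1, p2)
    print(p1, p2, round(average / math.sqrt(variance), 3))
```

With these inputs the second pair yields the smaller standardized difference, in keeping with the remark above that such populations seem more alike.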
When we compare in this manner $p_1$ with all the other series, we obtain
as before

$$a = \frac{n}{n-1}\,p_1 - \frac{\Sigma p}{n-1} = \frac{n}{n-1}\,(p_1 - [p]),$$

$$\sigma = \pm\sqrt{\frac{n-2}{n-1}\,p_1(1-p_1) + \frac{\Sigma p(1-p)}{n-1}}
= \pm\sqrt{p_1(1-p_1) + [p(1-p)] + \frac{[p(1-p)] - p_1(1-p_1)}{n-1}}.$$

Since $\sqrt{p(1-p)}$ is the standard deviation for $p$, the formula appears
identical with the one previously derived.



